Generating Synthetic RDF Data with Connected Blank Nodes for Benchmarking
Abstract
Generators of synthetic RDF datasets are important for testing and benchmarking semantic data management tasks (e.g. querying, storage, update, comparison, integration). However, current generators either support blank node connectivity inadequately or ignore it altogether. Blank nodes serve various purposes (e.g. describing complex attributes), and a significant percentage of resources is currently represented with blank nodes. Moreover, several semantic data management tasks, such as isomorphism checking (useful for checking equivalence) and blank node matching (useful in comparison, versioning, synchronization, and semantic similarity functions), not only have to deal with blank nodes, but their complexity and optimality depend on the connectivity of those blank nodes. To enable the comparative evaluation of techniques for carrying out these tasks, this paper presents the design and implementation of a generator, called BGen, which builds datasets containing blank nodes with a desired complexity, controllable through several features (morphology, size, diameter, density and clustering coefficient). Finally, the paper reports experimental results on the efficiency of the generator, as well as results from using the generated datasets that demonstrate the generator's value.
Similar resources
On Computing Deltas of RDF Knowledge Bases with Blank Nodes
The Semantic Web (SW) is an evolving extension of the World Wide Web in which the content can be expressed not only in natural language, but also in formal languages (e.g. RDF/S) that can be read and used by software agents, permitting them to find, share and integrate information more easily. The semantically structured content is expressed using RDF triples and a set of such triples constitut...
Blank Node Matching and RDF/S Comparison Functions
In RDF, a blank node (or anonymous resource or bnode) is a node in an RDF graph which is not identified by a URI and is not a literal. Several RDF/S Knowledge Bases (KBs) rely heavily on blank nodes as they are convenient for representing complex attributes or resources whose identity is unknown but their attributes (either literals or associations with other resources) are known. In this paper...
Everything you always wanted to know about blank nodes
In this paper we thoroughly cover the issue of blank nodes, which have been defined in RDF as ‘existential variables’. We first introduce the theoretical precedent for existential blank nodes from first order logic and incomplete information in database theory. We then cover the different (and sometimes incompatible) treatment of blank nodes across the W3C stack of RDF-related standards. We pre...
RDFLog: It’s like Datalog for RDF
RDF data is set apart from relational or XML data by its support of rich existential information in the form of blank nodes. Where in SQL databases null values are scoped over a single tuple, blank nodes in RDF can span over any number of statements and thus can be seen as existentially quantified variables. Blank node querying is considered in most RDF query languages, but blank node construct...
Well Behaved RDF: A Straw-Man Proposal for Taming Blank Nodes
The RDF language (Resource Description Framework) allows nodes in an RDF graph to be unlabeled – “blank nodes”. While blank nodes and certain other features are convenient for RDF authors, their unrestricted use causes complications to RDF consumers, such as when attempting to compare RDF graphs, which in the general case is as difficult as the graph isomorphism problem. This paper proposes a s...